perf(grep): up to 14.5× speedup via preFilter extensions and matcher reuse #248
Merged
Switch all UserRegex methods to acquireMatcher(input) to avoid per-call allocations, and propagate preFilter into searchContentMultiline so files with no needle are skipped without scanning every line.
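A minimal sketch of the matcher-reuse idea, assuming an RE2-style matcher that can be repointed at new input with `reset()`. `ToyMatcher`, `UserRegexSketch` and the `allocations` counter are hypothetical names, and a global JS `RegExp` stands in for RE2JS; this is not the `UserRegex` code from this PR.

```typescript
// Hypothetical stand-in for an RE2-style Matcher: mutable input plus the
// bounds of the last successful find().
class ToyMatcher {
  input: string;
  private readonly re: RegExp;
  private startPos = -1;
  private endPos = -1;

  constructor(re: RegExp, input: string) {
    this.re = re;
    this.input = input;
  }

  // Repoint at new input and clear the previous match.
  reset(input: string): this {
    this.input = input;
    this.startPos = -1;
    this.endPos = -1;
    return this;
  }

  // Search starting at `from`; `re` carries the "g" flag so lastIndex is honored.
  find(from: number): boolean {
    this.re.lastIndex = from;
    const m = this.re.exec(this.input);
    if (m === null) return false;
    this.startPos = m.index;
    this.endPos = m.index + m[0].length;
    return true;
  }

  start(): number {
    return this.startPos;
  }

  end(): number {
    return this.endPos;
  }
}

class UserRegexSketch {
  allocations = 0; // how many matchers were constructed (illustration only)
  private readonly re: RegExp;
  private cached: ToyMatcher | null = null;

  constructor(pattern: string) {
    this.re = new RegExp(pattern, "g");
  }

  // One cached matcher, reset per call, instead of a new object per call.
  // Only safe for synchronous methods that finish before anything else can
  // call back into this instance.
  private acquireMatcher(input: string): ToyMatcher {
    if (this.cached === null) {
      this.allocations++;
      this.cached = new ToyMatcher(this.re, input);
    }
    return this.cached.reset(input);
  }

  test(input: string): boolean {
    return this.acquireMatcher(input).find(0);
  }

  search(input: string): number {
    const m = this.acquireMatcher(input);
    return m.find(0) ? m.start() : -1;
  }
}
```

Running `test()` a thousand times on alternating inputs constructs a single matcher. The trade-off is that the matcher becomes shared mutable state, which is only safe while each call runs to completion before anything else touches the same instance.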
…tions
- matchAll(): revert to fresh `_re2.matcher(input)`. As a generator that
suspends at yield, sharing the cached `_matcher` risks corruption if a
caller interleaves any other UserRegex method (test/exec/search/replace)
between two next() calls — acquireMatcher would reset/repoint the shared
matcher, breaking the in-progress iteration. All other synchronous
methods continue to use acquireMatcher.
- matcher.test.ts: replace two toContain assertions in the multiline
preFilter tests with full output equality (per AGENTS.md guidance), so
regressions in line numbering or group separators surface.
- grep.ts: add file-level preFilter check right after readFile — skips
searchContent (and the content.split("\n")) entirely when no needle
exists in the file. Handles countOnly correctly (emits "0\n" or
"filename:0\n" without entering the line loop).
- matcher.ts searchContentMultiline: fix preFilter early return to emit
"0\n"/"filename:0\n" in count-only mode instead of empty string.
- user-regex.ts replace() callback path: capture matcher.start(0) and
matcher.end(0) before invoking the callback — acquireMatcher mutates
charSequence in-place, so a re-entrant call would corrupt those reads.
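The file-level fast path in `grep.ts` and the count-only fix in `searchContentMultiline` can be sketched as below. This is a hypothetical simplification, not the PR's code: `grepFileSketch`, `GrepOptionsSketch` and `countLine` are illustrative names, a plain non-global `RegExp` stands in for RE2, and the needles are assumed to come from the alternation's branches.

```typescript
// Hypothetical options; `filename` is set when output lines carry a "name:" prefix.
interface GrepOptionsSketch {
  countOnly: boolean;
  filename?: string;
}

function countLine(count: number, opts: GrepOptionsSketch): string {
  return opts.filename !== undefined ? `${opts.filename}:${count}\n` : `${count}\n`;
}

// `re` must not use the "g" flag, so re.test() keeps no state between lines.
function grepFileSketch(
  content: string,
  needles: string[],
  re: RegExp,
  opts: GrepOptionsSketch,
): string {
  // Whole-file O(n) substring scan: if no needle occurs anywhere, no line can
  // match, so skip split("\n") and the per-line regex loop entirely.
  const mayMatch = needles.length === 0 || needles.some((n) => content.includes(n));
  if (!mayMatch) {
    // Count-only mode must still report a zero count rather than print nothing.
    return opts.countOnly ? countLine(0, opts) : "";
  }

  // Drop the empty element after a trailing newline so it is not treated as a line.
  const body = content.endsWith("\n") ? content.slice(0, -1) : content;
  let count = 0;
  let out = "";
  for (const line of body.split("\n")) {
    if (!re.test(line)) continue; // the regex re-checks every candidate line
    count++;
    if (!opts.countOnly) {
      out += opts.filename !== undefined ? `${opts.filename}:${line}\n` : `${line}\n`;
    }
  }
  return opts.countOnly ? countLine(count, opts) : out;
}
```

For a file with neither `def ` nor `async def` in it, count-only mode with a filename returns `"a.py:0\n"` without ever splitting the content, which is the behavior the `countOnly` bullets describe.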
@Hazzng is attempting to deploy a commit to the Vercel Labs Team on Vercel. A member of the Team first needs to authorize it.
Contributor
Author
@cramforce I've made some code optimizations for grep, so it's getting quite a bit faster now compared to the last PR. Can you have a look, please? Thanks
cramforce
reviewed
May 24, 2026
The callback may re-enter the same UserRegex instance, which would route through acquireMatcher and repoint the shared matcher's charSequence, causing the next matcher.find(pos) to advance through the wrong input. Mirrors the matchAll() fix.
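The hazard described in this review can be reproduced with a toy model. `SharedCursor`, `RegexSketch` and the `useFresh` switch below are hypothetical, and a JS `RegExp` stands in for RE2. Note that the shared path already captures `start`/`end` before invoking the callback, as the earlier fix did; it still goes wrong, because the callback's `test()` repoints the shared cursor's input, so the next `find(pos)` scans the wrong string. A `matchAll()` generator suspended at `yield` is exposed to the same interleaving.

```typescript
// Minimal mutable matcher (hypothetical stand-in for an RE2-style Matcher).
class SharedCursor {
  input = "";
  start = -1;
  end = -1;
  private readonly re: RegExp;

  constructor(source: string) {
    this.re = new RegExp(source, "g");
  }

  reset(input: string): this {
    this.input = input;
    this.start = -1;
    this.end = -1;
    return this;
  }

  find(from: number): boolean {
    this.re.lastIndex = from;
    const m = this.re.exec(this.input);
    if (m === null) return false;
    this.start = m.index;
    this.end = m.index + m[0].length;
    return true;
  }
}

class RegexSketch {
  private readonly source: string;
  private readonly shared: SharedCursor;

  constructor(source: string) {
    this.source = source;
    this.shared = new SharedCursor(source);
  }

  private acquireMatcher(input: string): SharedCursor {
    return this.shared.reset(input); // repoints the one shared cursor
  }

  test(input: string): boolean {
    return this.acquireMatcher(input).find(0);
  }

  // useFresh=false reproduces the hazard; useFresh=true mirrors the fix.
  replace(input: string, cb: (match: string) => string, useFresh: boolean): string {
    const m = useFresh ? new SharedCursor(this.source).reset(input) : this.acquireMatcher(input);
    let out = "";
    let pos = 0;
    while (pos <= input.length && m.find(pos)) {
      const start = m.start; // captured before the callback can mutate m
      const end = m.end;
      out += input.slice(pos, start) + cb(input.slice(start, end));
      if (end === start) {
        // Empty match: copy one character and step past it to avoid looping forever.
        out += input.slice(start, start + 1);
        pos = start + 1;
      } else {
        pos = end;
      }
    }
    return out + input.slice(pos);
  }
}
```

With a callback that calls `test("zz999")` on the same instance, replacing `\d+` matches in `"a1b22"` yields `"a<1><b22>"` on the shared path (the second match's offsets come from `"zz999"` but are sliced out of the original string), versus the correct `"a<1>b<22>"` with a fresh matcher.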
cramforce
approved these changes
May 26, 2026
This was referenced May 26, 2026
Merged
nunofgs pushed a commit to nunofgs/just-bash that referenced this pull request Jun 9, 2026
…reuse (vercel-labs#248)

* perf(grep): extract preFilter needles from anchored alternation (^lit1|^lit2)

* perf(grep): reuse re2 matcher and add file-level preFilter fast-path

Switch all UserRegex methods to acquireMatcher(input) to avoid per-call allocations, and propagate preFilter into searchContentMultiline so files with no needle are skipped without scanning every line.

* fix(grep): address PR review - matchAll generator + full output assertions

- matchAll(): revert to fresh `_re2.matcher(input)`. As a generator that suspends at yield, sharing the cached `_matcher` risks corruption if a caller interleaves any other UserRegex method (test/exec/search/replace) between two next() calls; acquireMatcher would reset/repoint the shared matcher, breaking the in-progress iteration. All other synchronous methods continue to use acquireMatcher.
- matcher.test.ts: replace two toContain assertions in the multiline preFilter tests with full output equality (per AGENTS.md guidance), so regressions in line numbering or group separators surface.
- grep.ts: add file-level preFilter check right after readFile; skips searchContent (and the content.split("\n")) entirely when no needle exists in the file. Handles countOnly correctly (emits "0\n" or "filename:0\n" without entering the line loop).
- matcher.ts searchContentMultiline: fix preFilter early return to emit "0\n"/"filename:0\n" in count-only mode instead of empty string.
- user-regex.ts replace() callback path: capture matcher.start(0) and matcher.end(0) before invoking the callback, because acquireMatcher mutates charSequence in-place, so a re-entrant call would corrupt those reads.

* chore: bump patch ver with changeset

* fix(regex): use fresh matcher in replace() callback path

The callback may re-enter the same UserRegex instance, which would route through acquireMatcher and repoint the shared matcher's charSequence, causing the next matcher.find(pos) to advance through the wrong input. Mirrors the matchAll() fix.
Summary
- `regex.ts`: `extractPreFilter` now strips a leading `^` / trailing `$` from each alternative before extracting the literal needle. Patterns like `^def \|^async def` (previously unoptimized) now get the `String.indexOf` fast-path instead of running the full RE2 NFA against every line.
- `user-regex.ts`: `match()`, `replace()` (both string and callback paths), `search()`, and `matchAll()` now route through `acquireMatcher()`. Previously only `test()` and `exec()` used the cached matcher, leaving awk/sed hot-paths allocating a new `RE2JS.Matcher` on every call. (During review, `matchAll()` and the `replace()` callback path were switched back to a fresh matcher, since both can be re-entered while a match is in progress.)
- `matcher.ts`: `searchContentMultiline()` now receives `preFilter` and performs a whole-file `content.includes(needle)` check before splitting lines or invoking RE2, so files with no matching needle are rejected by an O(n) string scan instead of an O(n·m) NFA simulation.
- `grep.ts`: after `readFile`, a file-level needle check now skips `searchContent` (and `content.split("\n")`) entirely for files with no match. Count-only mode (`-c`) correctly emits `0\n` without entering the line loop.

Benchmark results
100 files × 100 lines, 5 runs, median reported. Baseline: `just-bash@3.0.1` (npm latest). Patterns benchmarked:

- `^def \|^async def` (anchored BRE, the key case)
- `def` (simple literal, baseline)
- `def \|async def` (unanchored alternation)

The simple-literal baseline case itself dropped 2.3× because `acquireMatcher` now covers `match()`, `search()`, and `replace()`, reducing GC pressure for all regex operations, not just the grep hot path.

Root cause (the `^def \|^async def` case)

`literalFromAlternative` in `regex.ts` rejected any alternative containing `^` or `$`, treating them as regex metacharacters. Since `^L` implies the line contains `L`, and `L$` implies the line contains `L`, stripping outer anchors before needle extraction is provably sound: it can only let extra lines through the prefilter (false positives), never reject a line that actually matches, and RE2 re-checks every candidate. The fix is a 17-line anchor strip in `literalFromAlternative` before the existing character loop.

Tests added
- `src/commands/search-engine/regex.test.ts`
- `src/regex/user-regex.test.ts`: `acquireMatcher` reuse for `match`, `replace` (string + callback), `search`, `matchAll`, 1000-call state leak check
- `src/commands/search-engine/matcher.test.ts`
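The anchor-stripping rule from the root-cause analysis can be sketched as follows. `literalFromAlternativeSketch`, `extractNeedlesSketch` and the metacharacter set are hypothetical and deliberately conservative; the actual `literalFromAlternative` in `regex.ts` may differ. The sketch assumes GNU BRE `\|` alternation, as in `^def \|^async def`, and returns `null` (no prefilter) whenever any branch is not a plain literal, because such a branch could match lines containing none of the needles.

```typescript
// Conservative (hypothetical) set of characters that make a branch non-literal.
const META = new Set(["\\", ".", "*", "+", "?", "(", ")", "[", "]", "{", "}", "|", "^", "$"]);

function literalFromAlternativeSketch(alt: string): string | null {
  let s = alt;
  // ^L and L$ both imply the line contains L, so dropping these outer anchors
  // can only admit extra candidates; the real regex re-checks each one.
  if (s.startsWith("^")) s = s.slice(1);
  if (s.endsWith("$") && !s.endsWith("\\$")) s = s.slice(0, -1);
  if (s.length === 0) return null;
  for (const ch of s) {
    if (META.has(ch)) return null; // any remaining metacharacter: give up
  }
  return s;
}

// Split a BRE alternation on "\|" and require every branch to yield a needle.
function extractNeedlesSketch(pattern: string): string[] | null {
  const needles: string[] = [];
  for (const alt of pattern.split("\\|")) {
    const lit = literalFromAlternativeSketch(alt);
    if (lit === null) return null; // one non-literal branch disables the prefilter
    needles.push(lit);
  }
  return needles;
}
```

For `^def \|^async def` this yields the needles `["def ", "async def"]`, which a whole-file or per-line `includes`/`indexOf` check can use before RE2 runs.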